Abstract



A curious connection exists between the theory of optimal stopping for independent random variables and branching processes. In particular, for the branching process $Z_n$ with offspring distribution $Y$, there exists a random variable $X$ such that the probability $P(Z_n = 0)$ of extinction of the $n$th generation in the branching process equals the value obtained by optimally stopping the sequence $X_1, \dots, X_n$, where these variables are i.i.d. and distributed as $X$. Generalizations to the inhomogeneous and infinite horizon cases are also considered. This correspondence furnishes a simple 'stopping rule' method for computing various characteristics of branching processes, including rates of convergence of the $n$th generation's extinction probability to the eventual extinction probability, for the supercritical, critical and subcritical Galton-Watson process. Examples, bounds, further generalizations and a connection to classical prophet inequalities are presented. Throughout, the aim is to show how this unexpected connection can be used to translate methods from one area of applied probability to another, rather than to provide the most general results.
1. Introduction and Summary.

The purpose of the present note is to highlight what we believe to be a hitherto unnoticed connection between two seemingly unrelated topics in applied probability: Optimal Stopping Theory for independent random variables, and Branching Processes and their extinction probabilities. We show how results in one area can be used to easily establish results in the other. Our main result is based on a mapping $Y \to X$ from integer valued offspring distributions to distributions on $[0, 1]$ such that the probability of extinction by generation $n$ of the Galton-Watson branching process with offspring distribution $Y$ equals the value obtained by optimally stopping a sequence of $n$ independent variables distributed as $X$. This correspondence is purely analytic; in particular, we are not able to present a probabilistic reason, such as a coupling, which explains it. As the focus is on the 'unexplained' connection, in exploiting the analytic equivalence of the two areas we do not strive for the most general results, but rather emphasize how one area can inform another area which is seemingly unrelated.

In Section 2 we outline the basic concepts needed from each of the two topics. In Section 3 we present our main result, a mapping $Y \to X$ from integer valued offspring distributions to distributions on $[0, 1]$ such that the probability of extinction by generation $n$ of the Galton-Watson branching process with offspring distribution $Y$ equals the value obtained by optimally stopping a sequence of $n$ independent variables distributed as $X$. Examples of this correspondence are given in Section 4. Section 5 is devoted to proving, by means of "stopping rule" methods, various (known) results on rates of convergence of the probabilities of extinction of the $n$th generation, denoted $q_n$, to the eventual probability of extinction, $\pi$, in the subcritical, critical and supercritical cases of the Galton-Watson process. In Section 6 we generalize the results to "inhomogeneous" Galton-Watson processes, and provide examples. In Section 7 we show, in the inhomogeneous case, how the use of suboptimal stopping rules and prophet inequalities may provide bounds on branching process extinction probabilities, and explore further connections to the prophet value.

2. Basic Concepts. 

(a) Optimal Stopping Theory. Consider a sequence $X_1, X_2, \dots, X_n$ of independent random variables with known distributions. A statistician gets to view the values sequentially, and at each stage must decide whether to take the present variable or continue. Exactly one variable must be selected; there is no recall, and hence a variable which has been passed up is no longer available to the statistician at a later stage. The goal of the statistician is to pick as large a value as possible. If stopping has not occurred before time $n$, the variable $X_n$ is automatically selected. The number of variables, $n$, is called the horizon of the problem. The value to the statistician of using a stopping rule $t$ is

$$EX_t = E\Big[\sum_{i=1}^{n} X_i\, I(t = i)\Big], \qquad (2.1)$$

where $I$ is the indicator function. The goal is to maximize the value in (2.1) over all possible stopping rules.

The general theory of optimal stopping is developed in Chow, Robbins and Siegmund (1971). For the finite horizon case an optimal rule always exists and can be obtained by backward induction (see Theorem 3.2, p. 50 of Chow, Robbins and Siegmund (1971)). In the case of independent random variables the optimal rule has a particularly simple form. Let $V_i^n$ be the value obtained by optimally stopping the sequence $X_i, \dots, X_n$; since stopping must occur at or before time $n$, we set $V_{n+1}^n = -\infty$. If stopping has not occurred by time $i$, it is optimal to choose $X_i$ only if it is at least as large as what is expected in the future. That is, if $X_i \ge V_{i+1}^n$ the value $X_i$ is selected, and it is passed up otherwise.
Hence, the value $V_i^n$ is the expectation of the larger of $X_i$ and $V_{i+1}^n$, that is,

$$V_i^n = E[X_i \vee V_{i+1}^n],$$

where $x \vee y = \max(x, y)$. Alternatively, letting

$$h_i(a) = E[X_i \vee a], \qquad (2.2)$$

we may write the following recursion for the sequence of values $V_i^n$:

$$V_i^n = h_i(V_{i+1}^n), \qquad i = n, n-1, \dots, 1. \qquad (2.3)$$

An optimal stopping rule is

$$t_n^* = \min\{i : X_i \ge V_{i+1}^n\}. \qquad (2.4)$$
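The recursion (2.3) and the rule (2.4) can be sketched in a few lines. This is a minimal sketch, assuming each $X_i$ is discrete and encoded as a hypothetical list of (value, probability) pairs; exact `Fraction` arithmetic keeps the values exact, and the start value $V_{n+1}^n = -\infty$ is represented by `-math.inf`, so that $X_n$ is always accepted. The fair die is our own illustrative choice.

```python
import math
from fractions import Fraction


def stopping_values(dists):
    """Backward induction (2.3) for independent X_1, ..., X_n.

    dists[i] is a list of (value, probability) pairs describing X_{i+1}
    (a hypothetical encoding). Returns [V_1^n, V_2^n, ..., V_n^n].
    """
    v_next = -math.inf  # V_{n+1}^n = -infinity, so X_n is always accepted
    values = []
    for dist in reversed(dists):
        # V_i^n = h_i(V_{i+1}^n) = E[max(X_i, V_{i+1}^n)]
        v_next = sum(p * max(x, v_next) for x, p in dist)
        values.append(v_next)
    return values[::-1]


def stop_index(observed, values):
    """Rule (2.4): the first (1-based) i with X_i >= V_{i+1}^n."""
    n = len(values)
    for i, x in enumerate(observed, start=1):
        threshold = values[i] if i < n else -math.inf
        if x >= threshold:
            return i
    return n


die = [(Fraction(k), Fraction(1, 6)) for k in range(1, 7)]
print(stopping_values([die, die, die]))  # V_1^3, V_2^3, V_3^3 = 14/3, 17/4, 7/2
```

For three rolls of a fair die the thresholds are $17/4$ at the first roll and $7/2$ at the second, and the last roll is always accepted.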
Note that $t_n^*$ will definitely stop by time $n$, if it has not stopped earlier. The value of this rule to the statistician is $V_1^n$. In the case where $X_n \ge 0$, $V_{n+1}^n = -\infty$ can be replaced by $V_{n+1}^n = 0$. The case where the $X_i$'s are nonnegative and i.i.d. is of particular interest. In this case $h_i$ in (2.2) does not depend on $i$, and the index $i$ will be omitted. Letting

$$h^{(1)}(a) = h(a) \quad\text{and}\quad h^{(n+1)}(a) = h(h^{(n)}(a)), \qquad n = 1, 2, \dots, \qquad (2.5)$$

we have

$$V_1^n = h(V_2^n) = h^{(2)}(V_3^n) = \cdots = h^{(n)}(0).$$

If we let $V_k$ denote the value for a $k$-horizon problem, then $V_i^n = V_{n-i+1}$, $i = 1, \dots, n$, and

$$V_k = h^{(k)}(0), \qquad k = 1, 2, \dots. \qquad (2.6)$$

For an infinite horizon problem in this i.i.d. setting, the value $V_\infty = \lim_{n\to\infty} V_n$ is the supremum of $EX_t$ over all stopping rules $t$ with $P(t < \infty) = 1$. It equals the rightmost point of the support of $X$, that is, the essential supremum of $X$. An optimal rule achieving $V_\infty$ will, however, not exist unless $X$ attains this value with positive probability.
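In the i.i.d. case, (2.6) reduces the computation to iterating a single function. As an illustration of our own choosing, take $X \sim U(0, 1)$, for which $h(a) = E[X \vee a] = a^2 + (1 - a^2)/2 = (1 + a^2)/2$. The sketch below iterates $h$ from 0; the values increase towards the essential supremum 1, which no rule with $P(t < \infty) = 1$ attains, since $P(X = 1) = 0$. (By Example 4.2, this $X$ is the one corresponding to critical binary splitting.)

```python
def iid_values(h, k_max):
    """Return [V_1, ..., V_{k_max}], where V_k = h^{(k)}(0) as in (2.6)."""
    values, v = [], 0.0
    for _ in range(k_max):
        v = h(v)  # one more stage of backward induction
        values.append(v)
    return values


def h_uniform(a):
    """h(a) = E[max(X, a)] for X ~ U(0, 1) and 0 <= a <= 1."""
    return (1.0 + a * a) / 2.0


vals = iid_values(h_uniform, 1000)
print(vals[0], vals[1], vals[-1])  # V_1 = 0.5, V_2 = 0.625, V_1000 close to 1
```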

(b) The Galton-Watson branching process.

Let $\mathcal{Y}$ be the set of all nonnegative, nondegenerate integer valued random variables, excluding the variables for which $P(Y = 0) = 0$. For $Y \in \mathcal{Y}$ let $p_k = P(Y = k)$, $k = 0, 1, \dots$, and

$$g(s) = \sum_{k=0}^{\infty} p_k s^k \qquad (2.7)$$

be the generating function of $Y$, which is well defined for $0 \le s \le 1$, with $g(0) = p_0$ and $g(1) = 1$. Note that if $EY < \infty$ then $g'(1) = EY$, and if $EY^2 < \infty$ then $g''(1) = EY^2 - EY$. All derivatives of $g(s)$ for $s \in [0, 1)$ exist and are nonnegative; thus in particular $g(s)$ is increasing and convex. The function $g$ is strictly convex unless it is linear, that is, unless $p_0 + p_1 = 1$.

For given $Y_n \in \mathcal{Y}$, $n = 1, 2, \dots$, define the (inhomogeneous, or varying environments) Galton-Watson branching process, with offspring distribution $Y_n$ at generation $n$, as the discrete time stochastic process $\{Z_n\}_{n=0}^{\infty}$ with $Z_0 = 1$ and

$$Z_{n+1} = \sum_{i=1}^{Z_n} W_{n+1,i}, \qquad (2.8)$$

where the $W_{n+1,i}$ are i.i.d. distributed like $Y_{n+1}$. The value $Z_n$ is the size of the $n$th generation of a population which begins with a single individual at time 0, where each member of generation $n-1$ gives rise to offspring for generation $n$ with distribution $Y_n$, independently of all the other members. Letting $g^{(n)}$ be the generating function of $Z_n$, and $g_n$ be the generating function of $Y_n$, we have the well known relation

$$g^{(n)}(s) = g_1(g_2(\cdots g_n(s)\cdots)), \qquad (2.9)$$

which can be verified by induction. A quantity of major interest is the probability that the $n$th generation is extinct,

$$P(Z_n = 0) = g^{(n)}(0) = q_n. \qquad (2.10)$$

Since $Z_n = 0$ implies $Z_{n+1} = 0$, we have $0 < q_1 \le q_2 \le \cdots$, and thus $\lim_{n\to\infty} q_n = \pi \le 1$ exists. The limit $\pi$ is the probability of eventual extinction. Furthermore, it is easily seen that $EZ_n$, the expected size of generation $n$, equals $\prod_{j=1}^{n} EY_j$. This follows by computing $[g^{(n)}]'(1)$ in (2.9) and using $g_j(1) = 1$.
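The composition (2.9) and the definition (2.10) can be cross-checked by direct simulation of (2.8). In this sketch the offspring laws are finitely supported and given as hypothetical probability lists `p` with `p[k]` $= P(Y = k)$; note that $g_n$ is applied first and $g_1$ last. The particular laws, seed and sample size are arbitrary choices, and the simulation uses only Python's `random` module.

```python
import random


def pgf(p):
    """Generating function (2.7) of the law P(Y = k) = p[k]."""
    return lambda s: sum(pk * s**k for k, pk in enumerate(p))


def extinction_prob(laws):
    """q_n = g_1(g_2(...g_n(0))) by (2.9)-(2.10); laws[j] is the law of Y_{j+1}."""
    s = 0.0
    for p in reversed(laws):  # innermost generating function g_n first
        s = pgf(p)(s)
    return s


def simulate_extinct(laws, rng):
    """One realisation of (2.8); returns True if Z_n = 0."""
    z = 1
    for p in laws:  # members of generation j - 1 reproduce according to Y_j
        if z == 0:
            return True
        z = sum(rng.choices(range(len(p)), weights=p, k=z))
    return z == 0


laws = [[0.2, 0.3, 0.5], [0.5, 0.2, 0.3], [0.1, 0.1, 0.8], [0.3, 0.4, 0.3]]
rng = random.Random(2024)
runs = 20000
estimate = sum(simulate_extinct(laws, rng) for _ in range(runs)) / runs
print(extinction_prob(laws), estimate)
```

The exact value and the Monte Carlo estimate should agree to within a few standard errors (about 0.0035 here).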

When all the $Y_n$ have identical distributions with generating function $g$,

$$g^{(1)}(s) = g(s), \qquad g^{(n+1)}(s) = g(g^{(n)}(s)), \qquad n = 1, 2, \dots, \qquad (2.11)$$

and we denote $q_n = P(Z_n = 0) = g^{(n)}(0)$ and $\lim_{n\to\infty} q_n = \pi$. As is well known (see e.g. Karlin and Taylor (1975), Chapter 8), in this instance $\pi$ is the smallest root in $[0, 1]$ of the equation

$$g(s) = s. \qquad (2.12)$$

The value $s = 1$ is always a root of (2.12), and it is the smallest root if and only if $EY < 1$ (the subcritical case) or $EY = 1$ (the critical case). There is positive probability of never becoming extinct, that is, of having $\pi < 1$, if and only if $EY > 1$ (the supercritical case).
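Since $q_n = g^{(n)}(0)$ increases to $\pi$, the smallest root of (2.12) can be computed by plain fixed-point iteration from 0. A minimal sketch, using the Poisson generating function $e^{\lambda(s-1)}$ of Example 4.3 with parameter values of our own choosing; the tolerance is an arbitrary assumption, and in the critical case the $O(1/n)$ convergence makes this stopping criterion unreliable.

```python
import math


def extinction_probability(g, tol=1e-13, max_iter=10**6):
    """Smallest root of g(s) = s in [0, 1], computed as lim g^{(n)}(0) via (2.11).

    The iterates increase monotonically; in the critical case EY = 1 they
    converge only at rate O(1/n), so this stopping criterion is unreliable there.
    """
    s = 0.0
    for _ in range(max_iter):
        s_next = g(s)
        if s_next - s < tol:
            return s_next
        s = s_next
    return s


def poisson_pgf(lam):
    return lambda s: math.exp(lam * (s - 1.0))


pi_super = extinction_probability(poisson_pgf(2.0))  # EY = 2 > 1: pi < 1
pi_sub = extinction_probability(poisson_pgf(0.5))    # EY = 0.5 < 1: pi = 1
print(pi_super, pi_sub)
```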
3. Connection. Our main result in the present section is to exhibit the connection between Optimal Stopping and homogeneous Galton-Watson Processes. In particular we link the optimal stopping value $V_n$ to the extinction probability $q_n$ using the following theorem.

Theorem 3.1. Let $Y \in \mathcal{Y}$ have generating function $g$, and let $\pi$ be the smallest root of the equation $g(s) = s$. Then the function $F(x)$ given by

$$F(x) = \begin{cases} 0 & x < 0, \\ g'(x) & 0 \le x < \pi, \\ 1 & \pi \le x, \end{cases} \qquad (3.1)$$

is a distribution function. Let $X$ have distribution (3.1), and $h(a) = E[X \vee a]$. Then

$$h(a) = g(a) \quad \text{for } 0 \le a \le \pi. \qquad (3.2)$$

Also

$$EX = P(Y = 0), \qquad P(X = 0) = P(Y = 1), \qquad \text{and when } \pi = 1,\ P(X < 1) = EY. \qquad (3.3)$$

The variable $X$ has an atom of size $P(Y = 1)$ at 0, an atom of size $1 - g'(\pi)$ at $\pi$, and density $g''(x)$ on $(0, \pi)$. Exactly one distribution satisfies (3.2).

Proof: The function $g'(x)$ is nonnegative and nondecreasing for $0 \le x \le \pi$. Note that $g'(\pi) \le 1$, since $g$ is convex and $s \le g(s)$ for all $0 \le s \le \pi$. Further, by definition, for $0 \le a \le \pi$,

$$h(a) = E[X \vee a] = \int_0^\infty P(X \vee a > x)\,dx = \pi - \int_a^\pi g'(x)\,dx = \pi - g(\pi) + g(a) = g(a),$$

which is (3.2). For $a = 0$, (3.2) yields $h(0) = g(0)$, or $EX = P(Y = 0)$, and $P(X = 0) = g'(0) = P(Y = 1)$. When $\pi = 1$, $EY \le 1$ and (3.1) yields $EY = g'(1) = F(1^-) = P(X < 1)$.

To show uniqueness, suppose (3.2) holds for some $X^*$ with distribution function $F^*$. Since $\pi = g(\pi) = h(\pi) = E[X^* \vee \pi]$, it follows that $P(X^* > \pi) = 0$, i.e. $F^*(x) = 1$ for all $x \ge \pi$. Also, since $g$ is differentiable in $0 \le s \le \pi$, so is $h$. But for $0 \le s \le \pi$, $h(s) = E[X^* \vee s] = 1 - \int_s^1 F^*(x)\,dx = g(s)$; thus $g'(s) = F^*(s)$ for $0 \le s < \pi$, and thus, by right continuity, $F^*(x) = F(x)$ for all $x$. ∎



Theorem 3.2. Let $Z_n$ be a Galton-Watson process with offspring distribution $Y \in \mathcal{Y}$ and extinction probability $q_n = P(Z_n = 0)$. Let $X_1, \dots, X_n$ be i.i.d. with distribution function (3.1), and let $V_n$ be their optimal stopping value. Then

$$V_n = q_n, \qquad n = 1, 2, \dots \qquad (3.4)$$

Proof: For $0 \le a \le \pi$ we have $0 \le g(a) \le \pi$. By (3.2) and induction,

$$h^{(n)}(a) = g^{(n)}(a) \quad \text{for } 0 \le a \le \pi. \qquad (3.5)$$

Using (2.6) and (2.10) and setting $a = 0$ in (3.5) yields (3.4). ∎
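Theorem 3.2 can be checked numerically. The sketch below, for Poisson offspring (our choice), builds $h(a) = E[X \vee a]$ only from the ingredients of (3.1): the value $F(a) = g'(a)$, the density $g''$ on $(a, \pi)$ and the atom $1 - g'(\pi)$ at $\pi$; the integral is done with a composite Simpson rule of our own, and $\pi$ is obtained by iterating $g$ from 0. The recursion (2.6) for $V_n$ and the iteration (2.11) for $q_n$ then agree up to quadrature error.

```python
import math


def simpson(f, a, b, m=2000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    if b <= a:
        return 0.0
    step = (b - a) / m
    total = f(a) + f(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * f(a + j * step)
    return total * step / 3


def smallest_root(g, iters=5000):
    s = 0.0
    for _ in range(iters):  # g^{(n)}(0) increases to pi
        s = g(s)
    return s


def h_from_distribution(a, g1, g2, pi):
    """E[X v a] for X with c.d.f. (3.1) and 0 <= a <= pi.

    X <= a with probability F(a) = g'(a); X has density g'' on (a, pi)
    and an atom of size 1 - g'(pi) at pi.
    """
    return a * g1(a) + simpson(lambda x: x * g2(x), a, pi) + pi * (1 - g1(pi))


def compare(lam, n_max=10):
    """Largest |V_n - q_n| over n = 1..n_max for Poisson(lam) offspring."""
    g = lambda s: math.exp(lam * (s - 1))
    g1 = lambda s: lam * g(s)        # g'
    g2 = lambda s: lam * lam * g(s)  # g''
    pi = smallest_root(g)
    v = q = worst = 0.0
    for _ in range(n_max):
        v = h_from_distribution(v, g1, g2, pi)  # optimal stopping recursion (2.6)
        q = g(q)                                # extinction probabilities (2.11)
        worst = max(worst, abs(v - q))
    return worst


print(compare(2.0), compare(0.5))  # both tiny: V_n = q_n up to quadrature error
```

The critical case is left out here only because the crude computation of $\pi$ by iteration is slow there.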
Remarks:

3.1 Equality (3.2) cannot hold for $\pi < a < 1$, since in this interval $g(a) < a$, while $h(a) = E[X \vee a] \ge a$.

3.2 The distribution of $Y$ is uniquely determined by the sequence $\{q_n\}_{n=1}^\infty$, since an analytic function $g$ is uniquely determined by its values on an infinite sequence of points having a limit point. Thus there are no two different $Y$'s with the same $q_n$-sequence.

3.3 In contrast to Remark 3.2, there are many different i.i.d. sequences of $X$'s with the same values $\{V_n\}_{n=1}^\infty$. For a construction, see Hill and Kertz (1982).

3.4 We excluded from $\mathcal{Y}$ the variables for which $P(Y = 0) = 0$. For such variables $\pi = 0$ is the smallest root of (2.12). Note that in this case $F$ of (3.1) gives unit mass to 0; thus (3.2) and (3.4) are formally true also for this case.

3.5 Theorem 3.1 shows that for each $Y \in \mathcal{Y}$ there exists an $X$ taking values in $[0, 1]$ such that (3.2) holds. However, it is not true that for each $X$ taking values in $[0, 1]$ there exists a corresponding $Y \in \mathcal{Y}$. Necessary and sufficient conditions for $X$ to correspond to a $Y \in \mathcal{Y}$ are that $X$ has a distribution function $F$ of the form

$$F(x) = \begin{cases} 0 & x < 0, \\ k(x) & 0 \le x < \pi, \\ 1 & \pi \le x, \end{cases} \qquad (3.6)$$

for some $0 < \pi \le 1$, and that

(i) $k(\cdot)$ has a power series expansion with all coefficients nonnegative, and
(ii) there exists a constant $c > 0$ such that $g(s) = \int_0^s k(x)\,dx + c$ satisfies (a) $g(1) = 1$ and (b) $g(\pi) = \pi$.

This fact suggests that it will be easier to use the correspondence to translate properties of optimal stopping into properties of Galton-Watson processes than vice versa.

4. Examples. 

The correspondence between $Y$ and $X$ of (3.1) yields some interesting relationships.

Example 4.1: $Y \sim B(p)$, Bernoulli. In this case $P(Y = 1) = p = 1 - P(Y = 0)$ and clearly $\pi = 1$. As $g(s) = (1 - p) + ps$, $F(s) = p$ for $0 \le s < 1$, and $F(1) = 1$. Hence, $X \sim B(1 - p)$.

Example 4.2: $Y \sim mB(p)$, $m \ge 2$, that is, $P(Y = m) = p = 1 - P(Y = 0)$, $g(s) = (1 - p) + ps^m$, and $EY = mp$. Using (3.3), since $P(Y = 1) = 0$, $X$ has no mass at zero, but has mass $1 - g'(\pi) = 1 - mp\pi^{m-1}$ at $\pi$. Therefore, for $0 \le s \le \pi$,

$$F(s) = mp\pi^{m-1}\left(\frac{s}{\pi}\right)^{m-1} + I(s = \pi)\,(1 - mp\pi^{m-1}),$$

that is, $X$ is a mixture of $\max_{i = 1, \dots, m-1} U_i$, where the $U_i$ are i.i.d. $U(0, \pi)$, with probability $mp\pi^{m-1}$, and a point mass at $\pi$ with probability $1 - mp\pi^{m-1}$. In particular, for $m = 2$ (corresponding to the splitting of a cell), in the critical case $p = 1/2$, $X \sim U(0, 1)$. For $m = 2$ and the supercritical case $p > 1/2$, the eventual extinction probability is the smallest solution of $1 - p + ps^2 - s = 0$, which is $\pi = (1 - p)/p$. Therefore, $X$ is a mixture of a $U(0, \pi)$ variable with probability $2(1 - p)$ and a point mass at $\pi = (1 - p)/p$ with probability $2p - 1$. In the subcritical case $p < 1/2$, $X$ is a mixture of a uniform $U(0, 1)$ variable with probability $2p$, and a point mass at 1 with probability $1 - 2p$.
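For $m = 2$ and $p \ge 1/2$, the identity (3.2) can be verified exactly. Using $E[\max(U, a)] = (\pi^2 + a^2)/(2\pi)$ for $U \sim U(0, \pi)$ and $0 \le a \le \pi$, the mixture above gives $h(a) = 1 - p + pa^2 = g(a)$. The sketch checks this in exact rational arithmetic on a grid of points; the values of $p$ and the grid are our own choices.

```python
from fractions import Fraction


def h_binary_split(a, p):
    """E[X v a] for the X of Example 4.2 with m = 2 and p >= 1/2, for 0 <= a <= pi.

    X is U(0, pi) with probability 2(1 - p) and equals pi with probability 2p - 1.
    """
    pi = (1 - p) / p
    uniform_part = (pi * pi + a * a) / (2 * pi)  # E[max(U, a)], U ~ U(0, pi)
    return 2 * (1 - p) * uniform_part + (2 * p - 1) * pi


def g_binary_split(s, p):
    """Generating function of Y with P(Y = 2) = p = 1 - P(Y = 0)."""
    return 1 - p + p * s * s


mismatches = [
    (p, j)
    for p in (Fraction(1, 2), Fraction(3, 5), Fraction(3, 4), Fraction(9, 10))
    for j in range(9)
    if h_binary_split((1 - p) / p * j / 8, p) != g_binary_split((1 - p) / p * j / 8, p)
]
print(mismatches)  # [] : h = g on [0, pi] for every p tried
```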

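Example 4.3: $Y \sim \mathcal{P}(\lambda)$, that is, $Y$ is Poisson with parameter $\lambda$, and $g(s) = e^{\lambda(s-1)}$. For $\lambda > 1$, $\pi < 1$ is the smallest root of $e^{\lambda(s-1)} = s$; for $\lambda \le 1$, $\pi = 1$. The distribution function of $X$ is

$$F(x) = \begin{cases} 0 & x < 0, \\ \lambda e^{\lambda(x-1)} & 0 \le x < \pi, \\ 1 & x \ge \pi. \end{cases}$$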

Example 4.4: $Y \sim \mathcal{GG}(b, c)$, the Generalized Geometric distribution: $P(Y = k) = bc^{k-1}$, $k = 1, 2, \dots$, and $P(Y = 0) = 1 - \sum_{k=1}^\infty bc^{k-1} = (1 - b - c)/(1 - c)$, for any $b, c > 0$ such that $b + c < 1$. The standard geometric distribution $\mathcal{G}(p)$ with success probability $p \in (0, 1)$, $p + q = 1$, is the special case $\mathcal{GG}(pq, q)$. Here $g(s) = P(Y = 0) + bs/(1 - cs)$, which can be written as

$$g(s) = (\alpha + \beta s)/(\gamma + \delta s) \qquad (4.1)$$

with

$$\alpha = 1 - (b + c), \quad \beta = b - c(1 - c), \quad \gamma = 1 - c, \quad \delta = -c(1 - c). \qquad (4.2)$$

This is (according to Athreya and Ney (1972, p. 6)) essentially the only nontrivial example where $g^{(n)}(s)$, and hence $g^{(n)}(0) = q_n$, can be computed explicitly. This example is also discussed in most other texts on branching processes; see e.g. Harris (1963, p. 9), and Karlin and Taylor (1975, p. 402). See also the continuation of this example in Example 6.2, below. Since $EY = b/(1 - c)^2$, it follows easily that for $b > (1 - c)^2$ the eventual extinction probability is $\pi = [1 - (b + c)]/[c(1 - c)] = -\alpha/\delta$. In all other cases $\pi = 1$. Here $X$ has c.d.f.

$$F(x) = \begin{cases} 0 & x < 0, \\ b/(1 - cx)^2 & 0 \le x < \pi, \\ 1 & x \ge \pi. \end{cases} \qquad (4.3)$$
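One standard way to see why $g^{(n)}$ is explicit here is that linear fractional maps compose like $2 \times 2$ matrices: $s \mapsto (\alpha + \beta s)/(\gamma + \delta s)$ corresponds to $\begin{pmatrix}\beta & \alpha\\ \delta & \gamma\end{pmatrix}$, so $g^{(n)}$ corresponds to the $n$th matrix power. The sketch below (our illustration, with the supercritical choice $b = 3/8$, $c = 1/2$, so that $\pi = 1/2$) compares $q_n$ from matrix powers with direct iteration of $g$, in exact arithmetic, and checks (4.1)-(4.2).

```python
from fractions import Fraction


def gg_coefficients(b, c):
    """(alpha, beta, gamma, delta) of (4.2) for the GG(b, c) law."""
    return 1 - (b + c), b - c * (1 - c), 1 - c, -c * (1 - c)


def g(s, b, c):
    """Generating function P(Y = 0) + b s / (1 - c s) of GG(b, c)."""
    return (1 - b - c) / (1 - c) + b * s / (1 - c * s)


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def q_via_matrices(b, c, n):
    """q_n = g^{(n)}(0) = (M^n)[0][1] / (M^n)[1][1], with M = [[beta, alpha], [delta, gamma]]."""
    alpha, beta, gamma, delta = gg_coefficients(b, c)
    M = [[beta, alpha], [delta, gamma]]
    P = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
    for _ in range(n):
        P = matmul(P, M)  # composing the maps multiplies their matrices
    return P[0][1] / P[1][1]


b, c = Fraction(3, 8), Fraction(1, 2)  # EY = b / (1 - c)^2 = 3/2 > 1
alpha, beta, gamma, delta = gg_coefficients(b, c)
pi = -alpha / delta                     # eventual extinction probability
direct, q = [], Fraction(0)
for n in range(1, 9):
    q = g(q, b, c)                      # direct iteration (2.11)
    direct.append(q)
via_matrix = [q_via_matrices(b, c, n) for n in range(1, 9)]
print(pi, direct[:3])
```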

5. Convergence rates of the extinction probabilities for the Galton-Watson process.

The purpose of the present section is not to derive new results, but rather to show how well-known results in branching theory have simple proofs by means of stopping rules. We do not strive for the most far-reaching results, and are content with obtaining rates at which $q_n \to \pi$.

Theorem 5.1.

(a) Supercritical case: If $EY > 1$ then $\pi < 1$ and

$$0 < \pi - q_n < \pi[g'(\pi)]^n. \qquad (5.1)$$

(b) Subcritical case: If $EY < 1$ (and $P(Y = 0) < 1$), then $\pi = 1$ and

$$0 < 1 - q_n \le [EY]^n, \qquad (5.2)$$

and the inequality on the right in (5.2) is strict if and only if $P(Y \le 1) < 1$.




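(c) Critical case: If $EY = 1$ and $\mathrm{Var}(Y) = \sigma^2 < \infty$, then

$$\lim_{n\to\infty} n[1 - g'(q_n)] = 2, \qquad (5.3)$$

or equivalently,

$$\lim_{n\to\infty} n(1 - q_n) = 2/\sigma^2. \qquad (5.4)$$

More generally, if $EY = 1$ and

$$\lim_{s \uparrow 1} (1 - s)g''(s)/[1 - g'(s)] = \alpha \qquad (5.5)$$

for some $0 < \alpha \le 1$, then

$$\lim_{n\to\infty} n[1 - g'(q_n)] = 1 + \alpha^{-1}. \qquad (5.6)$$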



Proof: (a) According to Theorem 3.1, for $X$ corresponding to $Y$, $P(X = \pi) = 1 - g'(\pi)$, which is positive. Now consider the suboptimal stopping rule $t$ which stops at the smallest $i$ for which $X_i = \pi$, and if no such $i$ exists, stops at time $n$ anyway. Since this rule is suboptimal, $EX_t$, the expected value to the statistician using rule $t$, is at most $V_n$; but it is greater than $\pi$ times the probability that the value $\pi$ will be observed, since stopping at $t = n$ with some value smaller than $\pi$ still yields a positive expected return. The probability of never observing the value $\pi$ is $[g'(\pi)]^n$. Thus $\pi(1 - [g'(\pi)]^n) < EX_t \le V_n = q_n$, from which (5.1) follows.

(b) The proof of (b) is essentially the same as that of (a), using $\pi = 1$ and $P(X = 1) = 1 - g'(1) = 1 - EY$. Equality in (5.2) holds if and only if the "suboptimal" rule $t$ is actually optimal. This happens if and only if $X$ is Bernoulli. This case is described in Example 4.1, where $P(Y \le 1) = 1$, and by the uniqueness of $X$, as stated in Theorem 3.1, this is the only case.
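The geometric bounds (5.1) and (5.2) are easy to check numerically. This sketch uses Poisson offspring with parameters of our own choosing, computes $\pi$ by iterating $g$ many times, and uses $g'(\pi) = \lambda g(\pi)$ and $EY = \lambda$; both Poisson laws have $P(Y \le 1) < 1$, so the bound in (5.2) should be strict.

```python
import math


def q_values(g, n):
    """[q_1, ..., q_n] with q_k = g^{(k)}(0)."""
    out, q = [], 0.0
    for _ in range(n):
        q = g(q)
        out.append(q)
    return out


def check_supercritical(lam, n=15):
    """Check 0 < pi - q_k < pi [g'(pi)]^k, (5.1), for Poisson(lam), lam > 1."""
    g = lambda s: math.exp(lam * (s - 1))
    pi = q_values(g, 5000)[-1]  # numerically converged pi
    slope = lam * g(pi)         # g'(pi) = 1 - P(X = pi) < 1
    return all(0 < pi - q < pi * slope**k for k, q in enumerate(q_values(g, n), start=1))


def check_subcritical(lam, n=20):
    """Check 0 < 1 - q_k < (EY)^k, (5.2) with strict inequality, for Poisson(lam), lam < 1."""
    g = lambda s: math.exp(lam * (s - 1))
    return all(0 < 1 - q < lam**k for k, q in enumerate(q_values(g, n), start=1))


print(check_supercritical(2.0), check_subcritical(0.5))
```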

(c) We shall draw on the results of Kennedy and Kertz (1991), who show that the asymptotic behavior of the value sequence $V_n$ for optimal stopping of i.i.d. random variables depends on which extremal distribution domain $X$ belongs to. In the present case, $X$ has no mass at 1, is bounded above by 1, has distribution function $g'(x)$, and has the nonzero density $g''(x)$ for $0 < x < 1$. In terms of the given c.d.f. and density, condition (5.5) is equivalent to the condition for a Type III extreme value distribution given in Theorem 1.6.1 of Leadbetter, Lindgren and Rootzen (1983) (see also e.g. de Haan (1976), Theorem 4 and the remark which follows). Theorem 1.1 of Kennedy and Kertz (1991) now yields (5.6). Note that when $\mathrm{Var}(Y) = \sigma^2 < \infty$ then $g''(1) = \sigma^2$ (since $EY = 1$), and the value of the limit in (5.5) is necessarily 1. Thus (5.3) is the particular case of (5.6) with $\alpha = 1$. Note that by convexity the value of the left hand side of (5.5) for every fixed $s$ is necessarily at most 1, and hence only $\alpha$-values less than or equal to one can be obtained as limits in (5.5).

To see that (5.3) is equivalent to (5.4), note that since $q_n \to 1$ and $\lim_{n\to\infty}(1 - g'(q_n))/(1 - q_n) = \sigma^2$, we obtain $\lim_{n\to\infty} n(1 - g'(q_n)) = \lim_{n\to\infty} n(1 - q_n)\sigma^2$. ∎
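For a concrete critical example, take Poisson(1) offspring (our choice): $EY = 1$, $\sigma^2 = 1$ and $g' = g$, so (5.3) and (5.4) both predict a limit of 2. The sketch simply iterates $g$; convergence is slow, with a correction of order $\log n / n$, so a large $n$ is used.

```python
import math


def scaled_gaps(n):
    """Return (n(1 - q_n), n[1 - g'(q_n)]) for critical Poisson(1) offspring."""
    g = lambda s: math.exp(s - 1.0)  # Poisson(1): EY = 1, sigma^2 = 1 and g' = g
    q = 0.0
    for _ in range(n):
        q = g(q)
    return n * (1.0 - q), n * (1.0 - g(q))


for n in (100, 10000, 100000):
    print(n, scaled_gaps(n))  # both coordinates approach 2 from below
```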
Remarks:

5.1 Standard proofs of various parts of Theorem 5.1 can be found in most texts on branching processes.

5.2 We see that the convergence of $q_n$ to $\pi$ is at a geometric rate in both the supercritical and subcritical cases. It is of order $O(1/n)$ in the critical case when $\mathrm{Var}(Y) < \infty$, but $1 - q_n$ converges to zero faster when $\mathrm{Var}(Y) = \infty$.

5.3 The branching process with $EY = 1$ and $\mathrm{Var}(Y) = \infty$ is studied in Slack (1968). Note that all values of $\alpha$, $0 < \alpha \le 1$, can be attained as the limit in (5.5), as seen from the following example.

Example 5.1: For $0 < \alpha \le 1$ and $0 < c \le 1/(1 + \alpha)$, let $Y$ have generating function

$$g(s) = s + c(1 - s)^{1+\alpha}. \qquad (5.7)$$

It is easily seen that this corresponds to the distribution

$$P(Y = 0) = c, \qquad P(Y = 1) = 1 - (1 + \alpha)c,$$
$$P(Y = k) = (-1)^k \frac{c}{k!}\prod_{j=-1}^{k-2}(\alpha - j), \qquad k = 2, 3, \dots \qquad (5.8)$$

(For $\alpha = 1$ it follows that $P(Y = k) = 0$ for $k > 2$.) Since $g'(1) = 1$, it follows that $EY = 1$, and easy arithmetic yields (5.5). Here (5.6) can be stated as

$$\lim_{n\to\infty} n(1 - q_n)^\alpha = (c\alpha)^{-1}, \qquad (5.9)$$

and shows that $q_n$ tends to 1 faster, the smaller $\alpha$. (Note that for $\alpha = 1$ one has $\mathrm{Var}(Y) = 2c$, and (5.9) agrees with (5.4) in this case.)
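Both parts of Example 5.1 can be checked directly. The sketch computes the probabilities (5.8) in exact arithmetic (they should be nonnegative, with partial sums below 1) and iterates $1 - g(q) = (1 - q) - c(1 - q)^{1+\alpha}$, which follows from (5.7), to watch $n(1 - q_n)^\alpha$ approach $(c\alpha)^{-1}$. The values $\alpha = c = 1/2$, for which the limit is 4, are our own choice.

```python
import math
from fractions import Fraction


def example_5_1_law(alpha, c, k_max):
    """[P(Y = 0), ..., P(Y = k_max)] from (5.8), in exact arithmetic."""
    probs = [c, 1 - (1 + alpha) * c]
    for k in range(2, k_max + 1):
        prod = Fraction(1)
        for j in range(-1, k - 1):  # j = -1, 0, ..., k - 2
            prod *= alpha - j
        probs.append((-1) ** k * c * prod / math.factorial(k))
    return probs


def scaled_gap(alpha, c, n):
    """n (1 - q_n)^alpha, iterating x -> x - c x^(1 + alpha) with x = 1 - q."""
    x = 1.0  # 1 - q_0
    for _ in range(n):
        x = x - c * x ** (1 + alpha)
    return n * x**alpha


law = example_5_1_law(Fraction(1, 2), Fraction(1, 2), 40)
print(law[:4], scaled_gap(0.5, 0.5, 200000))  # second value close to (c * alpha)^{-1} = 4
```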

5.4 Though in most natural situations the limit in (5.5) does exist, one can exhibit generating functions for which the limit in (5.5) fails to exist. One such construction is a function whose coefficient sequence essentially alternates between the coefficient sequences of generating functions of the form (5.7) for two different values of $\alpha$.

6. Inhomogeneous branching processes. 

In this section we consider the inhomogeneous branching process, as presented in Section 2(b). Here the offspring distribution in generation $i$ is $Y_i$, where the $Y_i$ need not have identical distributions. To each $Y_i$ there is a corresponding $X_i$ defined through (3.1), where $g$ there is replaced by $g_i$, and $\pi$ by $\pi_i$ (where $\pi_i$ is the eventual extinction probability of an ordinary Galton-Watson process with fixed offspring distribution $Y_i$). Now consider an optimal stopping problem where $X_1, \dots, X_n$ are observed sequentially. From (2.2) and (2.3) it follows that the value $V_1^n$ to the statistician of this sequence is

$$V_1^n = h_1(h_2(\cdots h_n(0)\cdots)). \qquad (6.1)$$

If we denote more generally

$$h^{(n)}(a) = h_1(h_2(\cdots h_n(a)\cdots)), \qquad (6.2)$$

then, using (2.9), we can generalize Theorem 3.1 and (3.5) as follows.
Theorem 6.1. Suppose $\pi_1 \ge \pi_2 \ge \cdots \ge \pi_n$. Then

$$h^{(n)}(a) = g^{(n)}(a) \quad \text{for } 0 \le a \le \pi_n,$$

and thus also $V_1^n = q_n$.

The proof is straightforward and hence omitted.

Inhomogeneous Galton-Watson processes have been studied quite extensively in the literature. The earlier references are Jagers (1974) and Jirina (1976). See also Section 3.5 in Jagers (1975). One of the latest references we have come across is D'Souza (1995); see also the related references mentioned there. All these papers deal with various aspects of the limiting value of $Z_n$ under different assumptions on the $Y_i$'s. The following theorem has a very simple "stopping rule" proof.
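Theorem 6.2. Suppose $\pi_i = \pi_0$ for all $i = 1, 2, \dots$, denote $r_i = P(Y_i \ne 1)$, and let $\tilde\pi = \lim_{n\to\infty} q_n$ be the eventual extinction probability of the inhomogeneous process. Then $\tilde\pi \le \pi_0$, with

$$\tilde\pi = \pi_0 \quad \text{if and only if} \quad \sum_{i=1}^\infty r_i = \infty. \qquad (6.3)$$

Proof: Let $X_i$ correspond to $Y_i$ through the relation (3.1). By Theorem 6.1, $q_n = V_1^n$, and hence $V_\infty = \lim_{n\to\infty} V_1^n = \lim_{n\to\infty} q_n = \tilde\pi$. Since by (3.1) $X_i \le \pi_0$, $i = 1, 2, \dots$, we have $\tilde\pi = V_\infty \le \pi_0$.

We will show that $\tilde\pi = \pi_0$ if and only if for all $0 < \varepsilon < \pi_0$,

$$\sum_{i=1}^\infty [1 - g_i'(\pi_0 - \varepsilon)] = \infty, \qquad (6.4)$$

and then prove that (6.4) is equivalent to the condition $\sum r_i = \infty$. Note that $P(X_i > \pi_0 - \varepsilon) = 1 - g_i'(\pi_0 - \varepsilon)$. Thus if (6.4) holds then, since the $X_i$ are independent, the Borel-Cantelli lemma gives $P(X_i > \pi_0 - \varepsilon \text{ infinitely often}) = 1$. Hence, for the rule $t = \inf\{i : X_i > \pi_0 - \varepsilon\}$, we have $P(t < \infty) = 1$, and the value of this rule, $EX_t$, is at least $\pi_0 - \varepsilon$; hence $\pi_0 - \varepsilon \le V_\infty \le \pi_0$. Since this is true for every $\varepsilon > 0$, it follows that $\tilde\pi = V_\infty = \pi_0$. Conversely, if (6.4) fails for some $0 < \varepsilon_0 < \pi_0$, then by the Borel-Cantelli lemma $P(X_i > \pi_0 - \varepsilon_0 \text{ infinitely often}) < 1$, and hence there is positive probability of never seeing a value greater than $\pi_0 - \varepsilon_0$; thus the supremum of the expected return over all stopping rules is less than $\pi_0$, that is, $\tilde\pi < \pi_0$.

It remains to verify that (6.4) is equivalent to the condition $\sum r_i = \infty$. Let $P(Y_i = k) = p_{ik}$, $k = 2, 3, \dots$. Then

$$g_i'(s) = 1 - r_i + \sum_{k \ge 2} k p_{ik} s^{k-1}.$$

Note that for any $m \ge 1$, $(\pi_0 - \varepsilon)^m \le (\pi_0 - \varepsilon)\pi_0^{m-1} = (1 - \varepsilon/\pi_0)\pi_0^m$, and also that $g_i'(\pi_0) \le 1$; thus

$$\begin{aligned} 1 - g_i'(\pi_0 - \varepsilon) &= r_i - \sum_{k \ge 2} k p_{ik}(\pi_0 - \varepsilon)^{k-1} \ge r_i - (1 - \varepsilon/\pi_0)\sum_{k \ge 2} k p_{ik}\pi_0^{k-1} \\ &= r_i - (1 - \varepsilon/\pi_0)[g_i'(\pi_0) - 1 + r_i] \ge r_i - (1 - \varepsilon/\pi_0)r_i = (\varepsilon/\pi_0)r_i. \end{aligned} \qquad (6.5)$$

Thus, if $\sum r_i = \infty$, (6.4) holds for every $0 < \varepsilon < \pi_0$ (and $\tilde\pi = \pi_0$). On the other hand, from the first equality in (6.5) it follows that $1 - g_i'(\pi_0 - \varepsilon) \le r_i$. Thus $\sum r_i < \infty$ implies that the sum in (6.4) converges. ∎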




Remark 6.1: Suppose $EY_i \le 1$ for all $i$, and $\sum_{i=1}^\infty [1 - EY_i] = \infty$. Since $1 - EY_i \le r_i$, it follows that $\sum_{i=1}^\infty r_i = \infty$ and $\tilde\pi = 1$.

Since $EY_i \le 1$, one has $P(X_i = 1) = 1 - g_i'(1) = 1 - EY_i$. Thus the probability that $X_i = 1$ infinitely often equals one, so the rule which stops for the smallest $i$ for which $X_i = 1$ stops with probability 1. Thus the value $V_\infty = 1$ is, in this case, attainable by a stopping rule $t$ with $P(t < \infty) = 1$. In all other situations where $\sum r_i = \infty$, the value $V_\infty = 1$ is not attainable, and only $\varepsilon$-optimal stopping rules exist.

Remark 6.2: Note that $\pi_i = \pi_0$ for all $i$ implies by (2.9) that $g^{(n)}(\pi_0) = \pi_0$ also. Thus, unlike the situation in the homogeneous Galton-Watson process, where $\lim g^{(n)}(s) = \pi$ for $0 \le s \le \pi$, in the inhomogeneous case it may happen that even though $\lim g^{(n)}(s)$ exists for all $0 \le s \le 1$, this limit need not equal $\pi_0$ for $0 \le s < \pi_0$, unless $\sum r_i = \infty$. A similar remark is true also for the case $\pi_0 = 1$.

Example 6.1: Let $Y_i$ take the values 0, 1 and 2 only, with probabilities $P(Y_i = 0) = r_i/3$, $P(Y_i = 1) = 1 - r_i$ and $P(Y_i = 2) = 2r_i/3$, where $0 < r_i \le 1$. Here $g_i(s) = r_i/3 + (1 - r_i)s + 2r_i s^2/3$, and it is easily checked that $\pi_i = 1/2$ for all $i$. Note that here $EY_i = 1 + r_i/3$. Since $EZ_n = \prod_{i=1}^n EY_i$, the condition $\sum r_i = \infty$ is equivalent to $\lim_{n\to\infty} EZ_n = \infty$.
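Theorem 6.2 can be illustrated numerically on Example 6.1, using $g_i(s) = s + (r_i/3)(1 - s)(1 - 2s)$, an equivalent form of $g_i$. The sketch compares $q_n = g_1(g_2(\cdots g_n(0)))$ for three sequences of our own choosing: $r_i = 1/2$ and $r_i = 1/(i+1)$ (both with $\sum r_i = \infty$) and $r_i = 1/(i+1)^2$ (summable). In the summable case $q_n$ stays at or below $1/4$, in line with the bound of Example 7.1, well short of $\pi_0 = 1/2$, while the harmonic case creeps slowly towards $1/2$.

```python
def g_i(s, r):
    """Generating function of Y_i in Example 6.1; equals s + (r/3)(1 - s)(1 - 2s)."""
    return r / 3 + (1 - r) * s + 2 * r * s * s / 3


def q_n(rs):
    """q_n = g_1(g_2(...g_n(0))) for Example 6.1, where rs[i-1] = r_i."""
    s = 0.0
    for r in reversed(rs):  # innermost g_n first
        s = g_i(s, r)
    return s


n = 2000
constant = q_n([0.5] * n)                                    # sum r_i = infinity
harmonic = q_n([1 / (i + 1) for i in range(1, n + 1)])       # sum r_i = infinity, slowly
summable = q_n([1 / (i + 1) ** 2 for i in range(1, n + 1)])  # sum r_i < infinity
print(constant, harmonic, summable)
```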

Example 6.2: As in Keiding and Nielsen (1975), let $Y_i$ have the Generalized Geometric distribution $\mathcal{GG}(b_i, c_i)$, as described in Example 4.4; hence, $Y_i$ has generating function as in (4.1),

$$g_i(s) = (\alpha_i + \beta_i s)/(\gamma_i + \delta_i s), \qquad (6.6)$$

where the constants are defined as in (4.2). Then it can be verified by induction that

$$g^{(n)}(s) = (\alpha^{(n)} + \beta^{(n)} s)/(\gamma^{(n)} + \delta^{(n)} s), \qquad (6.7)$$

and the values of $\alpha^{(n)}$, $\beta^{(n)}$, $\gamma^{(n)}$ and $\delta^{(n)}$ can be obtained explicitly. We shall consider in detail the case where all $Y_i$ are "critical", i.e. $b_i = (1 - c_i)^2$. For this case let

$$S_j^{(n)} = \sum c_{k_1} \cdots c_{k_j}, \qquad (6.8)$$

where the summation is over all $1 \le k_1 < \cdots < k_j \le n$. Set $S_0^{(n)} = 1$. Then one can verify that, up to a common factor which cancels in (6.7),

$$\alpha^{(n)} = \sum_{j=1}^{n}(-1)^{j-1} j\, S_j^{(n)}, \qquad \beta^{(n)} = \sum_{j=0}^{n}(-1)^{j}(j+1)\, S_j^{(n)},$$
$$\gamma^{(n)} = 1 + \sum_{j=2}^{n}(-1)^{j-1}(j-1)\, S_j^{(n)}, \qquad \delta^{(n)} = -\alpha^{(n)}. \qquad (6.9)$$

Clearly here $q_n = \alpha^{(n)}/\gamma^{(n)}$. It follows from (6.8) and (6.9) that for any $n$, all $n!$ permutations of the order of the $Y_i$'s yield the same distribution for the $n$th generation, $Z_n$. Note that here $P(Y_i = 1) = 1 - r_i = (1 - c_i)^2$, which implies that $r_i = c_i(2 - c_i)$. Thus, by Theorem 6.2, $\tilde\pi = 1$ if and only if $\sum c_i = \infty$.

It is of interest to note that the permutation invariance mentioned above can be generalized. Let $Y_1$ and $Y_2$ have generating functions of the form (4.1). Then the generating function of $Z_2$ is (see (2.9))

$$g_1(g_2(s)) = \frac{(\alpha_1\gamma_2 + \beta_1\alpha_2) + (\alpha_1\delta_2 + \beta_1\beta_2)s}{(\gamma_1\gamma_2 + \delta_1\alpha_2) + (\gamma_1\delta_2 + \delta_1\beta_2)s},$$

and it can then be verified that $g_1(g_2(s)) = g_2(g_1(s))$ if and only if $\alpha_1/\delta_1 = \alpha_2/\delta_2$. But for $\pi_i < 1$ one has $-\alpha_i/\delta_i = \pi_i$; thus, when $\pi_1, \pi_2 < 1$, the order does not matter if and only if $\pi_1 = \pi_2$. This generalizes immediately to the composition of $n$ such generating functions, and shows that in this case the order of the $Y_i$'s does not matter if and only if all $\pi_i = \pi < 1$. We do not know if this property has been observed earlier. Translating to optimal stopping, we have obtained a sequence of non-identically distributed variables for which the optimal stopping value is the same, no matter in which order the variables appear.

It is easy to show, by working out the distribution of $Z_2$ in Example 6.1, that even though $\pi_1 = \pi_2 = 1/2$ there, the $Y_i$'s there do not have the permutation invariance property.
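The permutation invariance can be checked in exact arithmetic. In this sketch (parameter values ours), $\mathcal{GG}(3/8, 1/2)$ and $\mathcal{GG}(21/32, 1/4)$ are both supercritical with $\pi = 1/2$, so their generating functions should commute, whereas $\mathcal{GG}(1/2, 1/4)$ is subcritical with $-\alpha/\delta = 4/3$ and should not commute with the first. For Example 6.1 with $r_1 = 1$, $r_2 = 1/2$, the two orders already give different values $P(Z_2 = 0)$.

```python
from fractions import Fraction as Fr


def gg_pgf(b, c):
    """Generating function of GG(b, c): P(Y = 0) + b s / (1 - c s)."""
    return lambda s: (1 - b - c) / (1 - c) + b * s / (1 - c * s)


def other_fixed_point(b, c):
    """-alpha/delta from (4.2); equals pi when the law is supercritical."""
    return (1 - b - c) / (c * (1 - c))


gA = gg_pgf(Fr(3, 8), Fr(1, 2))    # pi = 1/2
gB = gg_pgf(Fr(21, 32), Fr(1, 4))  # pi = 1/2
gC = gg_pgf(Fr(1, 2), Fr(1, 4))    # subcritical, -alpha/delta = 4/3

points = [Fr(0), Fr(1, 5), Fr(1, 2), Fr(9, 10)]
commute_AB = all(gA(gB(s)) == gB(gA(s)) for s in points)
commute_AC = all(gA(gC(s)) == gC(gA(s)) for s in points)


def g_ex61(r):
    """Example 6.1: pi_i = 1/2 for every r, but g_i is not linear fractional."""
    return lambda s: r / 3 + (1 - r) * s + 2 * r * s * s / 3


g1, g2 = g_ex61(Fr(1)), g_ex61(Fr(1, 2))
print(commute_AB, commute_AC, g1(g2(Fr(0))), g2(g1(Fr(0))))  # True False 19/54 10/27
```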

7. Connections to Prophet Values and Prophet Inequalities. 

When not all $\pi_i$ are equal, or when the condition $\sum r_i = \infty$ of Theorem 6.2 fails, one may still obtain meaningful, though sometimes crude, lower and upper bounds on $\tilde\pi$ through the use of suboptimal stopping rules, the 'prophet' value and the 'prophet inequality.' If $EX_t$ is the value of any (optimal or suboptimal) stopping rule $t$ for the $n$-horizon case, then $EX_t \le V_1^n = q_n \le V_\infty = \tilde\pi$, and if $EX_t$ is the value of a suboptimal rule for the infinite horizon case, $EX_t \le V_\infty = \tilde\pi$, yielding lower bounds on $q_n$ and $\tilde\pi$.
Let $V_p^n = E(\max(X_1, \dots, X_n))$ and $V_p^\infty = \lim_{n\to\infty} V_p^n$. $V_p^n$ and $V_p^\infty$ are called "prophet values". The term stems from the fact that an individual with complete foresight of the future would simply select the largest $X_i$ value in the sequence, and obtain the expected return $V_p^n$, the "prophet value". The prophet values $V_p^n$ and $V_p^\infty$ are usually much easier to compute than the optimal stopping value. Since the value of any stopping rule is necessarily less than or equal to that of the prophet, we have the upper bound $q_n = V_1^n \le V_p^n \le V_p^\infty$, and hence $\tilde\pi \le V_p^\infty$. In addition, the prophet value can also be used to obtain a lower bound on $\tilde\pi$. It is well known (see e.g. Hill and Kertz (1981)) that for a sequence of nonnegative independent random variables $V_p^n \le 2V_1^n$, and thus $V_p^n/2 \le V_1^n = q_n \le \tilde\pi$ serves as a lower bound on $q_n$ and $\tilde\pi$. Letting $n \to \infty$, we see that $V_p^\infty/2$ is also a lower bound on $\tilde\pi$.
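For the i.i.d. variables of Theorem 3.2 with $EY \le 1$ (so $\pi = 1$ and $F = g'$ on $[0, 1)$), the prophet value is $V_p^n = \int_0^1 (1 - g'(x)^n)\,dx$, so the chain $q_n \le V_p^n \le 2q_n$ can be checked numerically. A minimal sketch for Poisson offspring (our choice), with a hand-written Simpson rule for the integral:

```python
import math


def simpson(f, a, b, m=2000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    step = (b - a) / m
    total = f(a) + f(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * f(a + j * step)
    return total * step / 3


def values_poisson_subcritical(lam, n):
    """(q_n, V_p^n) for the i.i.d. X_i built from Poisson(lam) offspring, lam <= 1.

    Here pi = 1 and F = g' on [0, 1), so V_p^n = E max = int_0^1 (1 - g'(x)^n) dx.
    """
    g = lambda s: math.exp(lam * (s - 1))
    q = 0.0
    for _ in range(n):
        q = g(q)  # q_n = V_n by Theorem 3.2
    prophet = simpson(lambda x: 1 - (lam * g(x)) ** n, 0.0, 1.0)
    return q, prophet


print(values_poisson_subcritical(0.5, 5))  # q_5 <= V_p^5 <= 2 q_5
```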

Example 7.1: Consider Example 6.1 with $\sum r_i < \infty$. Since $g_i'(0) = 1 - r_i$ and $\pi_i = 1/2$, the $X_i$ corresponding to $Y_i$ has mass $1 - r_i$ at zero and is bounded above by 1/2. Hence, the variable $X_i^*$ with $P(X_i^* = 0) = 1 - r_i$ and $P(X_i^* = 1/2) = r_i$ is stochastically larger than $X_i$, and therefore the prophet value for the $X^*$ sequence is an upper bound on the prophet value for the $X$ sequence. The prophet value for the $X^*$-sequence is 1/2 times the probability that any of the $X_i^*$ variables equals 1/2, i.e., $(1/2)[1 - \prod_{i=1}^\infty(1 - r_i)]$. To obtain a lower bound on $\tilde\pi$, consider the suboptimal rule which stops for the smallest $i$ such that $X_i > 0$. It should be noted that since $\sum r_i < \infty$, this rule does not stop with probability one unless $r_i = 1$ for some $i$. Even if $r_i < 1$ for all $i$, the value of this "rule" equals the limit of the values of the rules $t_n$ which stop for the smallest $i$ such that $X_i > 0$, and stop at time $n$ if no positive $X_i$ is observed up to and including time $n$. The conditional expected return for stopping at $X_i$, given $X_i > 0$, is 1/3. Thus the value of this rule is $(1/3)[1 - \prod_{i=1}^\infty(1 - r_i)]$. A different lower bound can be obtained through the rule which stops for the smallest $i$ such that $X_i = 1/2$, if such an $i$ exists. Its expected return is $(1/2)[1 - \prod_{i=1}^\infty(1 - r_i/3)]$. Thus

$$\max\Big\{\tfrac{1}{2}\Big[1 - \prod_{i=1}^\infty(1 - r_i/3)\Big],\ \tfrac{1}{3}\Big[1 - \prod_{i=1}^\infty(1 - r_i)\Big]\Big\} \le \tilde\pi \le \tfrac{1}{2}\Big[1 - \prod_{i=1}^\infty(1 - r_i)\Big].$$

For example, if $r_i = 1/(i+1)^2$ we have

$$\prod_{i=1}^\infty(1 - r_i) = \lim_{n\to\infty}\prod_{i=1}^n \frac{i(i+2)}{(i+1)^2} = \lim_{n\to\infty}\frac{n+2}{2(n+1)} = 1/2,$$

so that $1/6 \le \tilde\pi \le 1/4$. (Recall that $\pi_i = 1/2$.)
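The bounds of Example 7.1 also hold in finite-horizon form: with the products taken over $i \le n$, the same two rules and the same prophet comparison bracket $q_n = V_1^n$. The sketch below checks this for $r_i = 1/(i+1)^2$ at a few horizons of our choosing, and verifies the telescoping product exactly for small $n$ (`math.prod` requires Python 3.8 or later).

```python
import math
from fractions import Fraction


def q_n(rs):
    """q_n = g_1(...g_n(0)) for Example 6.1 (floating point), rs[i-1] = r_i."""
    s = 0.0
    for r in reversed(rs):
        s = r / 3 + (1 - r) * s + 2 * r * s * s / 3
    return s


def bounds(rs):
    """Finite-horizon versions of the bounds in Example 7.1 for X_1, ..., X_n."""
    none_positive = math.prod(1 - r for r in rs)     # P(no X_i > 0)
    none_at_half = math.prod(1 - r / 3 for r in rs)  # P(no X_i = 1/2)
    lower = max((1 - none_at_half) / 2, (1 - none_positive) / 3)
    upper = (1 - none_positive) / 2                  # prophet value of the X* sequence
    return lower, upper


for n in (10, 1000):
    rs = [1 / (i + 1) ** 2 for i in range(1, n + 1)]
    print(n, bounds(rs), q_n(rs))  # lower <= q_n <= upper, upper below 1/4
```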

We have shown how the correspondence between Y and X can be used to obtain 
information about branching processes from computations involving an optimal stopping 
problem. The following theorem shows how the correspondence can be applied in the other 
direction. 

Theorem 7.1. Let $Y \in \mathcal{Y}$ with $EY \le 1$, and let $X$ be the corresponding random variable with distribution given in (3.1). With $X_1, \dots, X_n$ i.i.d. random variables distributed like $X$, let $X_n^* = \max(X_1, \dots, X_n)$. Then $X_n^*$ corresponds to a $Y_n^* \in \mathcal{Y}$, and the prophet value $EX_n^*$ can be computed using

$$EX_n^* = P(Y_n^* = 0). \qquad (7.1)$$

Proof: The distribution function of $X_n^* = \max(X_1, \dots, X_n)$ is

$$F_n^*(x) = \begin{cases} 0 & x < 0, \\ [g'(x)]^n & 0 \le x < 1, \\ 1 & 1 \le x. \end{cases} \qquad (7.2)$$

Clearly $k(x) = [g'(x)]^n$ satisfies condition (i) of Remark 3.5. Now since $g'(x) \le g'(1) = EY \le 1$, we have $[g'(x)]^n \le g'(x)$ for $0 \le x \le 1$ and $\int_0^1 [g'(x)]^n\,dx \le \int_0^1 g'(x)\,dx < 1$. Hence

$$g^*(s) = \int_0^s k(x)\,dx + \Big(1 - \int_0^1 k(x)\,dx\Big) \qquad (7.3)$$

further satisfies $g^*(0) > 0$ and $g^*(1) = 1$. Since here $\pi = 1$, condition (ii)(b) of Remark 3.5, $g(\pi) = \pi$, is equivalent to (ii)(a). ∎

Remark 7.1: If $EY > 1$, i.e. $\pi < 1$, then $\max(X_1, \dots, X_n)$ does not correspond to any $Y^* \in \mathcal{Y}$, since the distribution function of the maximum in this case cannot satisfy (ii)(a) and (b) of Remark 3.5 simultaneously.

Remark 7.2: When $EY \le 1$, $EY_n^* = dg^*(x)/dx\big|_{x=1} = k(1) = [g'(1)]^n = [EY]^n$.

For the cases below, which illustrate Theorem 7.1, the given $g(x)$ is sufficiently unaltered upon differentiation, taking powers, and integration that $g^*(x)$ of (7.3) corresponds to a variable $Y^*$ of the same 'type' as the original $Y$, with a mass at zero according to the constant term in (7.3).
Example 7.2: Let $X$ be the variable corresponding to the $Y$ of Example 4.2 with $p \le 1/m$, where $g'(x) = mpx^{m-1}$. Hence $k(x) = (mpx^{m-1})^n$ for $0 \le x < 1$, and hence

$$g^*(x) = (mp)^n x^{n(m-1)+1}/[n(m-1)+1] + \big(1 - (mp)^n/[n(m-1)+1]\big),$$

and the prophet value is $EX_n^* = P(Y_n^* = 0) = g^*(0) = 1 - (mp)^n/[n(m-1)+1]$. Note that $Y_n^*$ takes on only the two values 0 and $n(m-1)+1$, and hence is of the same type as the original $Y$.

Example 7.3: Let $X$ correspond to a Poisson $\mathcal{P}(\lambda)$ variable $Y$, as in Example 4.3, with $\lambda \le 1$. Then $k(x) = (\lambda e^{\lambda(x-1)})^n$, so

$$g^*(x) = (\lambda^{n-1}/n)\,e^{n\lambda(x-1)} + (1 - \lambda^{n-1}/n),$$

and hence $Y_n^*$ is a mixture of a Poisson $\mathcal{P}(n\lambda)$ random variable with probability $\lambda^{n-1}/n$, and the constant 0 with probability $1 - \lambda^{n-1}/n$. Thus the prophet value $EX_n^*$ can be computed by

$$P(Y_n^* = 0) = 1 - \frac{\lambda^{n-1}}{n}\big(1 - e^{-n\lambda}\big).$$

Example 7.4: Let $X$ have distribution (4.3) with $p \in (0, 1/2]$, $q = 1 - p$, $b = pq$, $c = p$, and $\pi = 1$. It follows that $Y$ is geometric and $g(x) = q/(1 - px)$. Hence $k(x) = (pq/(1 - px)^2)^n$ and we may write

$$g^*(x) = \frac{(p/q)^{n-1}}{2n-1}\left(\frac{q}{1 - px}\right)^{2n-1} + \left(1 - \frac{(p/q)^{n-1}}{2n-1}\right).$$

Hence, $Y_n^*$ is a mixture of a sum of $2n - 1$ independent geometric variables with generating function $q/(1 - px)$, that is, a negative binomial, with probability $(p/q)^{n-1}/(2n - 1)$, and the constant 0 with probability $1 - (p/q)^{n-1}/(2n - 1)$. Thus the prophet value $EX_n^*$ equals

$$P(Y_n^* = 0) = \frac{q^n p^{n-1}}{2n-1} + 1 - \frac{(p/q)^{n-1}}{2n-1}.$$
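The closed forms of Examples 7.2 to 7.4 can be checked against the defining integral. Since $\pi = 1$ and $X_n^*$ has c.d.f. (7.2), $EX_n^* = \int_0^1 (1 - g'(x)^n)\,dx = 1 - \int_0^1 k(x)\,dx = g^*(0)$, which is (7.1). The sketch evaluates this integral with a hand-written Simpson rule and compares it with the three formulas, for parameter values of our own choosing.

```python
import math


def simpson(f, a, b, m=2000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    step = (b - a) / m
    total = f(a) + f(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * f(a + j * step)
    return total * step / 3


def prophet_numeric(g1, n):
    """E max(X_1, ..., X_n) = int_0^1 (1 - g'(x)^n) dx when EY <= 1 (so pi = 1)."""
    return simpson(lambda x: 1 - g1(x) ** n, 0.0, 1.0)


def prophet_7_2(m, p, n):
    return 1 - (m * p) ** n / (n * (m - 1) + 1)


def prophet_7_3(lam, n):
    return 1 - lam ** (n - 1) / n * (1 - math.exp(-n * lam))


def prophet_7_4(p, n):
    q = 1 - p
    return q**n * p ** (n - 1) / (2 * n - 1) + 1 - (p / q) ** (n - 1) / (2 * n - 1)


n = 3
print(prophet_numeric(lambda x: 0.75 * x**2, n), prophet_7_2(3, 0.25, n))
```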

Remark 7.3: In a similar way it can also be shown that in the inhomogeneous case, when $EY_i \le 1$ for all $i = 1, \dots, n$, the prophet variable $X_n^* = \max(X_1, \dots, X_n)$ again corresponds to a $Y_n^* \in \mathcal{Y}$.